Background / Context:

Cluster randomized trials (CRTs), or studies in which intact groups of individuals are randomly 
assigned to a condition, are becoming more common in the evaluation of educational programs, 
policies, and practices. For example, a search on the website for the Institute of Education 
Sciences (IES) suggests that the National Center for Education Research (NCER) funded 
around 7 randomized trials in 2004. The same search for 2011 yielded around 36 funded 
randomized trials, or approximately 5 times as many trials. The website for the National Center 
for Education Evaluation and Regional Assistance (NCEE) reveals they have launched over 30 
evaluation studies in the past decade, the majority of them utilizing a randomized trial. Clearly 
there are a large number of randomized trials of educational programs, policies, and practices 
either complete or currently in the field. 

The overarching goal of these randomized trials is to generate rigorous evidence of whether or not a 
program works. In statistical terms, this is often referred to as the main effect of treatment. In the 
past 15 years, the field has made substantial progress in terms of how to design CRTs and how to 
calculate the statistical power for the main effect of treatment. However, designing a study to 
detect the main effect of treatment may not be sufficient. It is quite reasonable that context 
matters in these studies and thus designing studies to examine for whom and under what 
conditions a program is effective is critical. In order to do this, studies must also be designed to 
detect moderator effects. The power to detect moderator effects at the student, cluster, or site 
level in CRTs has received much less attention in the literature than the power for the main effect 
of treatment. Some of the recent work includes: Bloom (2005) provides power calculations for 
individual and cluster level moderators in a 2-level CRT; Raudenbush and Liu (2000) show 
power calculations for site-level moderator effects for multisite trials in which individuals are 
randomly assigned within sites; Hedges and Pigott (2004) discuss power calculations for 
moderator effects in the context of a meta-analysis which has direct comparisons to multisite 
cluster randomized trials; and Spybrook (in press) examines power for individual and cluster 
level moderators for CRTs but not site level moderators. 

Purpose / Objective / Research Question / Focus of Study: 

The purpose of this paper is to extend the work on power calculations for moderator effects to 
include moderator effects at any level for the following 4 types of CRTs: 2-level CRT, 3-level 
CRT, 3-level multisite cluster randomized trial (MSCRT), and 4-level MSCRT. In addition to 
providing the calculations and R code to do the calculations, we start to develop intuition around 
the minimum detectable effect size for moderator effects using sample sizes from CRTs in the 
field of education. 

Significance / Novelty of study: 

This paper represents the next step towards building the capacity of researchers to design CRTs 
that move beyond the main effect of treatment. The three primary power programs for CRTs, 
Optimal Design Plus (Raudenbush, Spybrook, Congdon, Liu, Martinez, Bloom, & Hill, 2011), 
CRT Power (Borenstein & Hedges, 2011), and Power UP (Dong & Maynard, 2013), do not 
routinely allow users to calculate power for moderator effects. The calculations and R code in 
this paper provide an accessible resource for researchers calculating power for moderator effects. 
Statistical, Measurement, or Econometric Model:

Given the space limitations in this proposal, we provide the models for the 3-level MSCRT and 
the power calculations for the main effect of treatment and a site level moderator. Details for all 
of the designs will be provided in the full paper. 

Main effect of Treatment 

We begin with the power for the main effect of treatment, since this is typically the primary effect 
of interest and provides intuition for the calculations for moderator effects. Suppose we have a 
study in which schools are the unit of randomization, but they are blocked by district. That is, 
within each district, schools are randomly assigned to condition and students are nested within 
schools, a 3-level MSCRT. In the case of no moderator, the student level model is: 

$$Y_{ijk} = \pi_{0jk} + e_{ijk}, \qquad e_{ijk} \sim N(0, \sigma^2)$$ [1]

for $i \in \{1, 2, \ldots, n\}$ persons per cluster, $j \in \{1, 2, \ldots, J\}$ clusters, and $k \in \{1, 2, \ldots, K\}$ sites,

where $\pi_{0jk}$ is the mean for cluster $j$ in site $k$; $e_{ijk}$ is the error associated with each person; and $\sigma^2$ 
is the within-cluster variance. 

The level-2 model, or cluster-level model, is: 

$$\pi_{0jk} = \beta_{00k} + \beta_{01k} W_{jk} + r_{0jk}, \qquad r_{0jk} \sim N(0, \tau_\pi)$$ [2]

where $\beta_{00k}$ is the mean for site $k$; $\beta_{01k}$ is the treatment effect at site $k$; $W_{jk}$ is a treatment contrast 
indicator, $1/2$ for treatment and $-1/2$ for the control; $r_{0jk}$ is the random effect associated with each 
cluster; and $\tau_\pi$ is the variance between clusters within sites. 

The level-3 model, or site-level model, is: 

$$\beta_{00k} = \gamma_{000} + u_{00k}, \qquad \operatorname{var}(u_{00k}) = \tau_{\beta_{00}}$$

$$\beta_{01k} = \gamma_{010} + u_{01k}, \qquad \operatorname{var}(u_{01k}) = \tau_{\beta_{11}}, \qquad \operatorname{cov}(u_{00k}, u_{01k}) = \tau_{\beta_{01}}$$ [3]

where $\gamma_{000}$ is the grand mean; $\gamma_{010}$ is the average treatment effect ("main effect of treatment"); 
$u_{00k}$ is the random effect associated with each site mean; $u_{01k}$ is the random effect associated 
with each site treatment effect; $\tau_{\beta_{00}}$ is the variance between site means; $\tau_{\beta_{11}}$ is the variance 
between sites on the treatment effect; and $\tau_{\beta_{01}}$ is the covariance between site-specific means and 
site-specific treatment effects. Note that we allow the treatment effect to vary randomly across 
sites; however, we could also treat this as a fixed effect. 

The estimate of the treatment effect and the variance of the estimated treatment effect are: 

$$\hat{\gamma}_{010} = \bar{Y}_E - \bar{Y}_C, \qquad \operatorname{Var}(\hat{\gamma}_{010}) = \frac{\tau_{\beta_{11}} + 4(\tau_\pi + \sigma^2/n)/J}{K}$$ [4]


The power for the test of the main effect of treatment for the 3-level MSCRT, $H_0: \gamma_{010} = 0$, 
follows the same logic as the power for the main effect of treatment for the 2-level CRT 
(Raudenbush, 1997). The F statistic in this case, though, is the ratio of $MS_{\text{treatment}}$ to 
$MS_{\text{treatment by cluster}}$. The ratio of expected mean squares is equivalent to $1 + \lambda$, where the 
noncentrality parameter is defined as: 

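$$\lambda = \frac{\delta^2}{[\sigma_\delta^2 + 4(\rho + (1 - \rho)/n)/J]/K}$$ [5]

where $\delta = \gamma_{010} / \sqrt{\tau_\pi + \sigma^2}$, $\rho = \tau_\pi / (\tau_\pi + \sigma^2)$, and $\sigma_\delta^2 = \tau_{\beta_{11}} / (\tau_\pi + \sigma^2)$.

We standardize the parameters by the sum of the within-site variance components, $\tau_\pi + \sigma^2$. As the 
noncentrality parameter increases, the power of the test increases. 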
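As a rough illustration of equation 5, the following Python sketch computes the noncentrality parameter from the standardized design parameters. It is not the R code referred to in this paper. The function names and the design values are hypothetical, and the power function uses a large-sample normal approximation rather than the noncentral F distribution described above, so it ignores the denominator degrees of freedom and will tend to overstate power when the number of sites is small.

```python
from statistics import NormalDist


def ncp_main_effect(delta, rho, sigma2_delta, n, J, K):
    """Noncentrality parameter for the main effect (equation 5).

    delta        standardized main effect of treatment
    rho          intraclass correlation, tau_pi / (tau_pi + sigma^2)
    sigma2_delta standardized between-site variance of the treatment effect
    n, J, K      persons per cluster, clusters per site, number of sites
    """
    within_site = 4.0 * (rho + (1.0 - rho) / n) / J
    return K * delta ** 2 / (sigma2_delta + within_site)


def approx_power(ncp, alpha=0.05):
    """Two-sided power from a normal approximation (assumption: large K)."""
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    root = ncp ** 0.5
    return z.cdf(root - crit) + z.cdf(-root - crit)


# Hypothetical design values, chosen only for illustration.
lam = ncp_main_effect(delta=0.20, rho=0.15, sigma2_delta=0.01, n=20, J=6, K=20)
print(round(lam, 3), round(approx_power(lam), 3))
```

For design work, the approximation would normally be replaced by a noncentral F calculation with the appropriate degrees of freedom.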


Site Moderator Effects 


As mentioned above, given the space limitations for the proposal, we provide the models with a 
site-level moderator. Levels 1 and 2 remain the same as equations 1 and 2. However, at level 
3 we assume there are an equal number of rural and urban sites (site type). 

The level-3 model, or site-level model, is: 


$$\beta_{00k} = \gamma_{000} + \gamma_{001} S_k + u_{00k}, \qquad \operatorname{var}(u_{00k}) = \tau_{\beta_{00}|S}$$

$$\beta_{01k} = \gamma_{010} + \gamma_{011} S_k + u_{01k}, \qquad \operatorname{var}(u_{01k}) = \tau_{\beta_{11}|S}, \qquad \operatorname{cov}(u_{00k}, u_{01k}) = \tau_{\beta_{01}|S}$$ [6]

where $\gamma_{000}$ is the grand mean; $\gamma_{001}$ is the effect of site type on the mean; $\gamma_{010}$ is the average 
treatment effect; $\gamma_{011}$ is the site type (urban or rural) by treatment interaction; $S_k$ is a site contrast 
indicator, $1/2$ for rural and $-1/2$ for urban; $u_{00k}$ is the random effect associated with each site mean; 
$u_{01k}$ is the random effect associated with each site treatment effect; $\tau_{\beta_{00}|S}$ is the residual variance 
between site means; $\tau_{\beta_{11}|S}$ is the residual variance between sites on the treatment effect; and $\tau_{\beta_{01}|S}$ is 
the residual covariance between site-specific means and site-specific treatment effects. Note that 
we allow the treatment effect to vary randomly across sites; however, we could also treat this as 
a fixed effect. 

The power for the test of the site-level moderator for the 3-level MSCRT, $H_0: \gamma_{011} = 0$, follows 
the same logic as the power for the main effect of treatment. However, the noncentrality 
parameter in this case is: 

$$\lambda = \frac{\delta_S^2}{[4\sigma_{\delta|S}^2 + 16(\rho + (1 - \rho)/n)/J]/K}$$ [7]
Note that $\delta_S$ is simply the standardized site moderator effect and $\sigma_{\delta|S}^2$ is the residual variance in 
the treatment effect across sites. A quick comparison of the noncentrality parameters associated 
with the test for the main effect of treatment (equation 5) and the site moderator effect (equation 
7) reveals the following. The site-level moderator does reduce the site-by-treatment variance, $\sigma_{\delta|S}^2$; 
however, the within-site variance term, $[\rho + (1 - \rho)/n]/J$, is 4 times larger for the site moderator 
effect than for the main effect of treatment. This suggests that much larger sample sizes 
are needed to detect site moderator effects. 
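
The factor of 4 can be traced with a short variance calculation. The sketch below is an illustration added under stated assumptions rather than a derivation taken from the full paper: it assumes a balanced design with $J/2$ clusters per condition in every site and $K/2$ sites of each type, and it writes $\hat{\beta}_{01k}$ for the treatment-control difference in cluster means at site $k$.

```latex
\begin{align*}
\operatorname{Var}(\hat{\beta}_{01k})
  &= \tau_{\beta_{11}|S} + \frac{4(\tau_\pi + \sigma^2/n)}{J}, \\
\operatorname{Var}(\hat{\gamma}_{011})
  &= \frac{\operatorname{Var}(\hat{\beta}_{01k})}{K/2}
   + \frac{\operatorname{Var}(\hat{\beta}_{01k})}{K/2}
   = \frac{4\tau_{\beta_{11}|S} + 16(\tau_\pi + \sigma^2/n)/J}{K}.
\end{align*}
```

The moderator effect is a difference between two averages of site-specific treatment effects, each based on only half of the sites, so its sampling variance is 4 times the variance of a single average over all $K$ sites. Dividing by $\tau_\pi + \sigma^2$ gives the standardized denominator used in the noncentrality parameter.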

Findings / Results: 

Larger sample sizes are required to detect a site level moderator than the main effect of 
treatment. Although not shown in this proposal, this finding also holds for a cluster level 
moderator. However, for CRTs, the power to detect an individual level moderator may be larger 
than the power for the main effect of treatment because the number of individuals is the most 
influential sample size for detecting individual level moderator effects (Bloom, 2005; Spybrook, in press). 
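
To give a sense of scale, the Python sketch below compares the noncentrality parameters in equations 5 and 7 and searches for the number of sites at which the site moderator test reaches the same noncentrality parameter as a main effect test. It is an illustration only, not the R code referred to in this paper. The function names, the design values, and the assumption that the moderator explains half of the treatment effect variance are all hypothetical. Matching noncentrality parameters is also only a rough proxy for matching power.

```python
def ncp_main_effect(delta, rho, sigma2_delta, n, J, K):
    """Equation 5: noncentrality parameter for the main effect of treatment."""
    return K * delta ** 2 / (sigma2_delta + 4.0 * (rho + (1.0 - rho) / n) / J)


def ncp_site_moderator(delta_s, rho, sigma2_delta_s, n, J, K):
    """Equation 7: noncentrality parameter for a binary site-level moderator."""
    return K * delta_s ** 2 / (
        4.0 * sigma2_delta_s + 16.0 * (rho + (1.0 - rho) / n) / J
    )


def sites_needed(target_ncp, ncp_fn, max_K=10000, **design):
    """Smallest even K (up to max_K) whose noncentrality reaches target_ncp."""
    K = 2
    while K <= max_K:
        if ncp_fn(K=K, **design) >= target_ncp:
            return K
        K += 2  # keep K even so the two site types can be balanced
    raise ValueError("target not reached within max_K sites")


# Hypothetical design values, chosen only for illustration.
main = dict(delta=0.20, rho=0.15, sigma2_delta=0.01, n=20, J=6)
moderator = dict(delta_s=0.20, rho=0.15, sigma2_delta_s=0.005, n=20, J=6)

target = ncp_main_effect(K=20, **main)
k_mod = sites_needed(target, ncp_site_moderator, **moderator)
print(k_mod)
```

With these illustrative inputs, a moderator effect of the same standardized size as the main effect needs 78 sites to match the noncentrality parameter that the main effect reaches with 20 sites.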

Conclusions: 

Designing studies to detect not only whether or not an intervention works, but also for whom and under 
what circumstances, is critical. The results from this study suggest that in many cases, if a study 
is powered to detect a reasonable main effect of treatment and it has a reasonable number of 
individuals per cluster, then it will also be powered to detect an individual level moderator 
(although not shown in this proposal). However, the sample size requirements for adequate 
power to detect a cluster level moderator or site level moderator exceed the sample size 
requirements to power a study to detect a main effect of treatment. This presents an important 
challenge to researchers designing CRTs and to funding agencies supporting CRTs. For the 
researchers, it is critical to perform these calculations so that they are aware of whether or not 
they are powered to detect moderator effects. For funders, this suggests that given current levels 
of funding, studies may not be able to detect more than the main effect of treatment. 